Using J-Pruning to Reduce Overfitting of Classification Rules in Noisy Domains

Author

  • Max Bramer
Abstract

The automatic induction of classification rules from examples is an important technique used in data mining. One of the problems encountered is the overfitting of rules to training data. This paper describes J-pruning, a means of reducing overfitting based on the J-measure, an information-theoretic measure of the information content of a rule. It examines the effectiveness of J-pruning in the presence of noisy data for two rule induction algorithms: one that generates rules via the intermediate representation of a decision tree, and one that generates rules directly from examples.
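The J-measure on which J-pruning is based can be illustrated with a short Python sketch. It assumes the standard definition due to Smyth and Goodman for a rule IF Y=y THEN X=x: J = p(y) · [p(x|y) log2(p(x|y)/p(x)) + (1 − p(x|y)) log2((1 − p(x|y))/(1 − p(x)))], i.e. the probability that the rule fires times the relative entropy between the class distribution among covered instances and the prior class distribution. The helper names (`j_measure`, `rule_j_value`, `j_prune_stop`) are hypothetical, and the stopping test (halt specialisation when adding a term would lower the J-value) is an illustrative assumption rather than the paper's exact procedure.

```python
import math


def _term(p: float, q: float) -> float:
    """Return p * log2(p / q), using the convention 0 * log2(0 / q) = 0."""
    if p == 0.0:
        return 0.0
    return p * math.log2(p / q)


def j_measure(p_y: float, p_x: float, p_x_given_y: float) -> float:
    """J-measure of the rule IF Y=y THEN X=x, in bits.

    p_y         -- probability that the rule's left-hand side (antecedent) holds
    p_x         -- prior probability of the class x predicted by the rule
    p_x_given_y -- probability of class x among instances covered by the rule
    """
    # j: relative entropy between the posterior and prior (binary) class distributions
    j = _term(p_x_given_y, p_x) + _term(1.0 - p_x_given_y, 1.0 - p_x)
    # Weight by how often the rule fires
    return p_y * j


def rule_j_value(n_total: int, n_class: int, n_covered: int, n_covered_class: int) -> float:
    """Hypothetical helper: J-value estimated from training-set counts (n_covered > 0)."""
    return j_measure(n_covered / n_total, n_class / n_total, n_covered_class / n_covered)


def j_prune_stop(j_current: float, j_specialised: float) -> bool:
    """Hypothetical J-pruning test: stop specialising if the extra term lowers J."""
    return j_specialised < j_current


if __name__ == "__main__":
    # 100 training instances, 50 of class x; the rule covers 20, of which 18 are x.
    j_rule = rule_j_value(100, 50, 20, 18)
    # Adding a term narrows coverage to 10 instances (9 of class x).
    j_more = rule_j_value(100, 50, 10, 9)
    print(f"J(rule) = {j_rule:.4f}, J(specialised) = {j_more:.4f}")
    print("stop specialising:", j_prune_stop(j_rule, j_more))
```

Running the script prints `J(rule) = 0.1062, J(specialised) = 0.0531` and `stop specialising: True`: the extra term leaves the rule's accuracy unchanged but halves its coverage, so under this assumed criterion the rule would not be specialised further.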

Similar sources

Pre-pruning Classification Trees to Reduce Overfitting in Noisy Domains

The automatic induction of classification rules from examples in the form of a classification tree is an important technique used in data mining. One of the problems encountered is the overfitting of rules to training data. In some cases this can lead to an excessively large number of rules, many of which have very little predictive value for unseen data. This paper describes a means of reducin...

Jmax-pruning: A facility for the information theoretic pruning of modular classification rules

The Prism family of algorithms induces modular classification rules in contrast to the Top Down Induction of Decision Trees (TDIDT) approach which induces classification rules in the intermediate form of a tree structure. Both approaches achieve a comparable classification accuracy. However in some cases Prism outperforms TDIDT. For both approaches pre-pruning facilities have been developed in ...

Using J-pruning to reduce overfitting in classification trees

The automatic induction of classification rules from examples in the form of a decision tree is an important technique used in data mining. One of the problems encountered is the overfitting of rules to training data. In some cases this can lead to an excessively large number of rules, many of which have very little predictive value for unseen data. This paper is concerned with the reduction of...

J-measure Based Hybrid Pruning for Complexity Reduction in Classification Rules

Prism is a modular classification rule generation method based on the ‘separate and conquer’ approach that is an alternative to the rule induction approach using decision trees also known as ‘divide and conquer’. Prism often achieves a similar level of classification accuracy compared with decision trees, but tends to produce a more compact noise tolerant set of classification rules. As with other...

Induction of Modular Classification Rules: Using Jmax-pruning

The Prism family of algorithms induces modular classification rules which, in contrast to decision tree induction algorithms, do not necessarily fit together into a decision tree structure. Classifiers induced by Prism algorithms achieve accuracy comparable to that of decision trees and in some cases even outperform decision trees. Both kinds of algorithms tend to overfit on large and nois...

Publication date: 2002